Abstract

In this paper, we consider a generalized longest common subsequence problem, in which one constraining
sequence of length s must be included as a substring and another constraining sequence of length t must
be excluded as a subsequence of the two main sequences, and the length of the result must be maximal. For
two input sequences X and Y of lengths n and m, and two constraining sequences of lengths
s and t, we present an O(nmst) time dynamic programming algorithm for solving this new generalized
longest common subsequence problem. A more detailed analysis reduces the time complexity further to
cubic time, O(nmt). The correctness of the new algorithm is proved.


1 Introduction 

The longest common subsequence (LCS) problem is a well-known measurement for computing the similarity 
of two strings. It can be widely applied in diverse areas, such as file comparison, pattern matching and 
computational biology |3101 [H [S]- 

Given two sequences X and Y, the longest common subsequence (LCS) problem is to find a subsequence 
of X and Y whose length is the longest among all common subsequences of the two given sequences. 

For some biological applications some constraints must be applied to the LCS problem. These kinds of
variants of the LCS problem are called the constrained LCS (CLCS) problem. Recently, Chen and Chao [1]
proposed more generalized forms of the CLCS problem, the generalized constrained longest common
subsequence (GC-LCS) problems. For two input sequences X and Y of lengths n and m, respectively, and
a constraint string P of length r, the GC-LCS problem is a set of four problems which are to find the LCS
of X and Y including/excluding P as a subsequence/substring, respectively.

In this paper, we consider a more general constrained longest common subsequence problem called
STR-IC-SEQ-EC-LCS, in which one constraining sequence of length s must be included as a substring and another
constraining sequence of length t must be excluded as a subsequence of the two main sequences, and the length
of the result must be maximal. We present the first efficient dynamic programming algorithm for solving
this problem.

The organization of the paper is as follows.

In the following 4 sections we describe our dynamic programming algorithm for the STR-IC-SEQ-EC-LCS
problem.

In section 2 the preliminary knowledge needed for presenting our algorithm for the STR-IC-SEQ-EC-LCS
problem is discussed. In section 3 we give a new dynamic programming solution for the STR-IC-SEQ-EC-LCS
problem with time complexity O(nmst), where n and m are the lengths of the two given input strings, and
s and t are the lengths of the two constraining sequences. In section 4 the time complexity is further improved
to O(nmt). Some concluding remarks are in section 5.
2 Characterization of the STR-IC-SEQ-EC-LCS problem

A sequence is a string of characters over an alphabet Σ. A subsequence of a sequence X is obtained by
deleting zero or more characters from X (not necessarily contiguous). A substring of a sequence X is a
subsequence of successive characters within X.

For a given sequence X = x_1 x_2 ··· x_n of length n, the i-th character of X is denoted as x_i ∈ Σ for
any i = 1, ..., n. A substring of X from position i to j can be denoted as X[i : j] = x_i x_{i+1} ··· x_j. If
i ≠ 1 or j ≠ n, then the substring X[i : j] is called a proper substring of X. A substring
X[i : j] is called a prefix or a suffix of X if i = 1 or j = n, respectively.

An appearance of sequence X = x_1 x_2 ··· x_n in sequence Y = y_1 y_2 ··· y_m, for any X and Y, starting at
position j is a sequence of strictly increasing indexes i_1, i_2, ..., i_n such that i_1 = j, and
X = y_{i_1}, y_{i_2}, ..., y_{i_n}.

A compact appearance of X in Y starting at position j is the appearance with the smallest last index i_n. A
match for sequences X and Y is a pair (i, j) such that x_i = y_j. The total number of matches for X and Y
is denoted by δ. It is obvious that δ ≤ nm.

For the two input sequences X = x_1 x_2 ··· x_n and Y = y_1 y_2 ··· y_m of lengths n and m, respectively,
and two constrained sequences P = p_1 p_2 ··· p_s and Q = q_1 q_2 ··· q_t of lengths s and t, the
STR-IC-SEQ-EC-LCS problem is to find a constrained LCS of X and Y including P as a substring and excluding Q as a
subsequence.
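For very small inputs, the problem statement above can be checked directly by exhaustive search. The following Python sketch is not one of the algorithms of this paper: the names `is_subsequence` and `brute_force_str_ic_seq_ec_lcs` are hypothetical, and the running time is exponential in n, so it is only meant as a reference against which a faster implementation could be tested.

```python
from itertools import combinations


def is_subsequence(s, t):
    """Return True if s can be obtained from t by deleting characters."""
    it = iter(t)
    return all(ch in it for ch in s)


def brute_force_str_ic_seq_ec_lcs(X, Y, P, Q):
    """Return one longest common subsequence of X and Y that contains P as a
    substring and does not contain Q as a subsequence, or None if none exists.

    Candidates are enumerated from longest to shortest, so the first
    feasible candidate found has maximal length.
    """
    for length in range(len(X), -1, -1):
        for idx in combinations(range(len(X)), length):
            z = "".join(X[i] for i in idx)
            if P in z and is_subsequence(z, Y) and not is_subsequence(Q, z):
                return z
    return None


if __name__ == "__main__":
    # "abcb" itself contains "bb" as a subsequence, so one 'b' must be dropped.
    print(brute_force_str_ic_seq_ec_lcs("abcb", "abcb", "b", "bb"))
```

Returning `None` for an infeasible instance is a choice of this sketch; the algorithms below report a length of 0 or -∞ in that situation.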

Definition 1 Let Z(i, j, k, r) denote the set of all LCSs of X[1 : i] and Y[1 : j] such that for each z ∈
Z(i, j, k, r), z includes P[1 : k] as a substring, and excludes Q[1 : r] as a subsequence, where 1 ≤ i ≤ n, 1 ≤
j ≤ m, 0 ≤ k ≤ s, and 0 ≤ r ≤ t. The length of an LCS in Z(i, j, k, r) is denoted as g(i, j, k, r).

Definition 2 Let W(i, j, k, r) denote the set of all LCSs of X[1 : i] and Y[1 : j] such that for each w ∈
W(i, j, k, r), w excludes Q[1 : r] as a subsequence, and includes P[1 : k] as a suffix, where 1 ≤ i ≤ n, 1 ≤ j ≤
m, 0 ≤ k ≤ s, and 0 ≤ r ≤ t. The length of an LCS in W(i, j, k, r) is denoted as f(i, j, k, r).

Definition 3 Let U(i, j, k) denote the set of all LCSs of X[i : n] and Y[j : m] such that for each u ∈ U(i, j, k),
u excludes Q[k : t] as a subsequence, where 1 ≤ i ≤ n, 1 ≤ j ≤ m, 0 ≤ k ≤ t. The length of an LCS in
U(i, j, k) is denoted as h(i, j, k).

Definition 4 Let V(i, j, k) denote the set of all LCSs of X[1 : i] and Y[1 : j] such that for each v ∈ V(i, j, k),
v excludes Q[1 : k] as a subsequence, where 1 ≤ i ≤ n, 1 ≤ j ≤ m, 0 ≤ k ≤ t. The length of an LCS in
V(i, j, k) is denoted as v(i, j, k).

The following theorem characterizes the structure of an optimal solution based on optimal solutions to
subproblems, for computing the LCSs in W(i, j, k, r), for any 1 ≤ i ≤ n, 1 ≤ j ≤ m, 0 ≤ k ≤ s, and 0 ≤ r ≤ t.

Theorem 1 If Z[1 : l] = z_1, z_2, ..., z_l ∈ W(i, j, k, r), then the following conditions hold:

1. If i, j, k > 0, r = 1, x_i = y_j = p_k = q_r, then z_l ≠ x_i and Z[1 : l] ∈ W(i - 1, j - 1, k, r).

2. If i, j, k > 0, r > 1, x_i = y_j = p_k = q_r, then z_l ≠ x_i implies Z[1 : l] ∈ W(i - 1, j - 1, k, r); z_l = x_i
implies Z[1 : l - 1] ∈ W(i - 1, j - 1, k - 1, r - 1).

3. If i, j, k > 0, x_i = y_j = p_k and r > 0, x_i ≠ q_r or r = 0, then z_l = x_i = y_j = p_k and Z[1 : l - 1] ∈
W(i - 1, j - 1, k - 1, r).

4. If i, j, k > 0, x_i = y_j and x_i ≠ p_k, then z_l ≠ x_i and Z[1 : l] ∈ W(i - 1, j - 1, k, r).

5. If i, j > 0, k = 0, r = 1, x_i = y_j = q_r, then z_l ≠ x_i and Z[1 : l] ∈ W(i - 1, j - 1, k, r).

6. If i, j > 0, k = 0, r > 1, x_i = y_j = q_r, then z_l ≠ x_i implies Z[1 : l] ∈ W(i - 1, j - 1, k, r); z_l = x_i
implies Z[1 : l - 1] ∈ W(i - 1, j - 1, k, r - 1).

7. If i, j > 0, k = 0, x_i = y_j and r > 0, x_i ≠ q_r or r = 0, then z_l = x_i and Z[1 : l - 1] ∈ W(i - 1, j - 1, k, r).

8. If i, j > 0, x_i ≠ y_j, then z_l ≠ x_i implies Z[1 : l] ∈ W(i - 1, j, k, r).

9. If i, j > 0, x_i ≠ y_j, then z_l ≠ y_j implies Z[1 : l] ∈ W(i, j - 1, k, r).

Proof.

1. In this case, if x_i = z_l, then Z[1 : l] includes Q[1 : r], a contradiction. Therefore, we have x_i ≠ z_l, and
Z[1 : l] must be an LCS of X[1 : i - 1] and Y[1 : j - 1] including P[1 : k] as a suffix and excluding Q[1 : r]
as a subsequence, i.e. Z[1 : l] ∈ W(i - 1, j - 1, k, r).

2. There are two subcases to be distinguished in this case.

2.1. If z_l = x_i, then Z[1 : l - 1] is a common subsequence of X[1 : i - 1] and Y[1 : j - 1] including
P[1 : k - 1] as a suffix and excluding Q[1 : r - 1] as a subsequence. We can show that Z[1 : l - 1] is an LCS
of X[1 : i - 1] and Y[1 : j - 1] including P[1 : k - 1] as a suffix and excluding Q[1 : r - 1] as a subsequence.
Assume by contradiction that there exists a common subsequence a of X[1 : i - 1] and Y[1 : j - 1] including
P[1 : k - 1] as a suffix and excluding Q[1 : r - 1] as a subsequence, whose length is greater than l - 1. Then
the concatenation of a and z_l will result in a common subsequence of X[1 : i] and Y[1 : j] including P[1 : k]
as a suffix and excluding Q[1 : r] as a subsequence, whose length is greater than l. This is a contradiction.
Therefore, in this case we have Z[1 : l - 1] ∈ W(i - 1, j - 1, k - 1, r - 1).

2.2. If z_l ≠ x_i, then Z[1 : l] must be an LCS of X[1 : i - 1] and Y[1 : j - 1] including P[1 : k] as a suffix
and excluding Q[1 : r] as a subsequence, i.e. Z[1 : l] ∈ W(i - 1, j - 1, k, r).

3. In this case, we have no constraints on Q, provided r > 0, x_i ≠ q_r or r = 0. Therefore we have
x_i = y_j = p_k = z_l. It is obvious that Z[1 : l - 1] is a common subsequence of X[1 : i - 1] and Y[1 : j - 1]
including P[1 : k - 1] as a suffix and excluding Q[1 : r] as a subsequence. We can show that Z[1 : l - 1] is an
LCS of X[1 : i - 1] and Y[1 : j - 1] including P[1 : k - 1] as a suffix and excluding Q[1 : r] as a subsequence.
Assume by contradiction that there exists a common subsequence a of X[1 : i - 1] and Y[1 : j - 1] including
P[1 : k - 1] as a suffix and excluding Q[1 : r] as a subsequence, whose length is greater than l - 1. Then the
concatenation of a and z_l will result in a common subsequence of X[1 : i] and Y[1 : j] including P[1 : k] as
a suffix and excluding Q[1 : r] as a subsequence, whose length is greater than l. This is a contradiction.

4. In this case, since x_i = y_j ≠ p_k, we have x_i ≠ z_l, otherwise Z[1 : l] would not include P[1 : k] as a
suffix. Therefore, Z[1 : l] must be an LCS of X[1 : i - 1] and Y[1 : j - 1] including P[1 : k] as a suffix and
excluding Q[1 : r] as a subsequence, i.e. Z[1 : l] ∈ W(i - 1, j - 1, k, r).

5. Since x_i = y_j = q_1 and r = 1, we have x_i ≠ z_l, otherwise Z[1 : l] would include Q[1 : r] as a
subsequence. Therefore, Z[1 : l] must be an LCS of X[1 : i - 1] and Y[1 : j - 1] including P[1 : k] as a suffix
and excluding Q[1 : r] as a subsequence, i.e. Z[1 : l] ∈ W(i - 1, j - 1, k, r).

6. There are two subcases to be distinguished in this case.

6.1. If z_l = x_i, then Z[1 : l - 1] is a common subsequence of X[1 : i - 1] and Y[1 : j - 1] excluding
Q[1 : r - 1] as a subsequence. We can show that Z[1 : l - 1] is an LCS of X[1 : i - 1] and Y[1 : j - 1]
excluding Q[1 : r - 1] as a subsequence. Assume by contradiction that there exists a common subsequence
a of X[1 : i - 1] and Y[1 : j - 1] excluding Q[1 : r - 1] as a subsequence, whose length is greater than l - 1.
Then the concatenation of a and z_l will result in a common subsequence of X[1 : i] and Y[1 : j] excluding
Q[1 : r] as a subsequence, whose length is greater than l. This is a contradiction. Therefore, in this case we
have Z[1 : l - 1] ∈ W(i - 1, j - 1, k, r - 1).

6.2. If z_l ≠ x_i, then Z[1 : l] must be an LCS of X[1 : i - 1] and Y[1 : j - 1] excluding Q[1 : r] as a
subsequence, i.e. Z[1 : l] ∈ W(i - 1, j - 1, k, r).

7. Since x_i = y_j and r > 0, x_i ≠ q_r or r = 0, we have z_l = x_i, and Z[1 : l - 1] is a common subsequence of
X[1 : i - 1] and Y[1 : j - 1] excluding Q[1 : r] as a subsequence. We can show that Z[1 : l - 1] is an LCS of
X[1 : i - 1] and Y[1 : j - 1] excluding Q[1 : r] as a subsequence. Assume by contradiction that there exists
a common subsequence a of X[1 : i - 1] and Y[1 : j - 1] excluding Q[1 : r] as a subsequence, whose length is
greater than l - 1. Then the concatenation of a and z_l will result in a common subsequence of X[1 : i] and
Y[1 : j] excluding Q[1 : r] as a subsequence, whose length is greater than l. This is a contradiction.

8. Since x_i ≠ y_j and z_l ≠ x_i, Z[1 : l] must be a common subsequence of X[1 : i - 1] and Y[1 : j] including
P[1 : k] as a suffix and excluding Q[1 : r] as a subsequence. It is obvious that Z[1 : l] is also an LCS of
X[1 : i - 1] and Y[1 : j] including P[1 : k] as a suffix and excluding Q[1 : r] as a subsequence.

9. Since x_i ≠ y_j and z_l ≠ y_j, Z[1 : l] must be a common subsequence of X[1 : i] and Y[1 : j - 1] including
P[1 : k] as a suffix and excluding Q[1 : r] as a subsequence. It is obvious that Z[1 : l] is also an LCS of
X[1 : i] and Y[1 : j - 1] including P[1 : k] as a suffix and excluding Q[1 : r] as a subsequence.

The proof is completed. □


3 A simple dynamic programming algorithm 

Our new algorithm for solving the STR-IC-SEQ-EC-LCS problem consists of three main stages. The main
idea of the new algorithm can be described by the following Theorem 2.

Theorem 2 Let Z[1 : l] = z_1, z_2, ..., z_l be a solution of the STR-IC-SEQ-EC-LCS problem, i.e. Z[1 : l] ∈
Z(n, m, s, t). Then its length l = g(n, m, s, t) can be computed by the following formula:

$$
g(n,m,s,t) = \max_{1 \le i \le n,\ 1 \le j \le m,\ 1 \le r \le t} \{ f(i,j,s,r) + h(i+1,j+1,r) \} \tag{1}
$$

where f(i, j, s, r) is the length of an LCS in W(i, j, s, r) defined by Definition 2, and h(i, j, r) is the length
of an LCS in U(i, j, r) defined by Definition 3.

Proof.

Since Z[1 : l] ∈ Z(n, m, s, t), Z[1 : l] must be an LCS of X and Y including P as a substring and excluding
Q as a subsequence. Let the first appearance of the string P in Z[1 : l] start at position l' - s + 1 and end at l'
for some positive integer s ≤ l' ≤ l, i.e. Z[l' - s + 1 : l'] = P.

Let

$$
r^* = \max_{1 \le r \le t} \{ r \mid Q[1:r] \text{ is a subsequence of } Z[1:l'] \}
$$

Since Z[1 : l'] excludes Q as a subsequence, we have r* < t, and thus Z[1 : l'] excludes Q[1 : r* + 1] as a
subsequence. For the same reason, Z[l' + 1 : l] excludes Q[r* + 1 : t] as a subsequence.

Let

$$
(i^*, j^*) = \min \{ (i,j) \mid Z[1:l'] \text{ is a common subsequence of } X[1:i] \text{ and } Y[1:j] \}
$$

Then, Z[1 : l'] is a common subsequence of X[1 : i*] and Y[1 : j*] including P as a suffix and excluding
Q[1 : r* + 1] as a subsequence. It follows from Definition 2 that

$$
l' \le f(i^*, j^*, s, r^*+1) \tag{2}
$$

Since Z[1 : l] is a common subsequence of X and Y, Z[l' + 1 : l] must be a common subsequence of
X[i* + 1 : n] and Y[j* + 1 : m]. We already know that Z[l' + 1 : l] excludes Q[r* + 1 : t] as a subsequence.
Therefore, Z[l' + 1 : l] is a common subsequence of X[i* + 1 : n] and Y[j* + 1 : m] excluding Q[r* + 1 : t] as
a subsequence. It follows from Definition 3 that

$$
l - l' \le h(i^*+1, j^*+1, r^*+1) \tag{3}
$$

Combining formulas (2) and (3) we have,

$$
l \le f(i^*, j^*, s, r^*+1) + h(i^*+1, j^*+1, r^*+1)
$$

Therefore,

$$
l \le \max_{1 \le i \le n,\ 1 \le j \le m,\ 1 \le r \le t} \{ f(i,j,s,r) + h(i+1,j+1,r) \} \tag{4}
$$

On the other hand, for any a ∈ W(i, j, s, r) and b ∈ U(i + 1, j + 1, r), 1 ≤ i ≤ n, 1 ≤ j ≤ m, 1 ≤ r ≤ t,
c = a ⊕ b, the concatenation of a and b, must be a common subsequence of X[1 : n] and Y[1 : m]
including P as a substring. Furthermore, we can prove that c excludes Q as a subsequence.

In fact, let

$$
r^* = \max_{0 \le r' \le t} \{ r' \mid Q[1:r'] \text{ is a subsequence of } a \}
$$

We then have r* < r, since a excludes Q[1 : r] as a subsequence.

In this case, if c includes Q as a subsequence, then b must include Q[r* + 1 : t] as a subsequence. It
follows from r* + 1 ≤ r that b includes Q[r : t] as a subsequence. This is a contradiction.

Therefore, c = a ⊕ b is a common subsequence of X[1 : n] and Y[1 : m] including P as a
substring and excluding Q as a subsequence, and thus |a ⊕ b| ≤ l. That is:

$$
\max_{1 \le i \le n,\ 1 \le j \le m,\ 1 \le r \le t} \{ f(i,j,s,r) + h(i+1,j+1,r) \} \le l \tag{5}
$$

Combining formulas (4) and (5) we have,

$$
l = \max_{1 \le i \le n,\ 1 \le j \le m,\ 1 \le r \le t} \{ f(i,j,s,r) + h(i+1,j+1,r) \}
$$

The proof is completed. □

The first stage is to find LCSs in W(i, j, k, r). Let f(i, j, k, r) denote the length of an LCS in W(i, j, k, r).

By the optimal substructure properties of the STR-IC-SEQ-EC-LCS problem shown in Theorem 1, we can
build a recursive formula for computing f(i, j, k, r). For any 1 ≤ i ≤ n, 1 ≤ j ≤ m, 0 ≤ k ≤ s,
and 0 ≤ r ≤ t, the values of f(i, j, k, r) can be computed by the following recursive formula (6).

$$
f(i,j,k,r) =
\begin{cases}
\max\{f(i-1,j,k,r),\ f(i,j-1,k,r)\} & \text{if } x_i \ne y_j \\
1 + f(i-1,j-1,k-1,r) & \text{if } x_i = y_j = p_k \wedge (r = 0 \vee x_i \ne q_r) \\
f(i-1,j-1,k,r) & \text{if } x_i = y_j = p_k = q_r \wedge r = 1 \\
\max\{1 + f(i-1,j-1,k-1,r-1),\ f(i-1,j-1,k,r)\} & \text{if } x_i = y_j = p_k = q_r \wedge r > 1 \\
f(i-1,j-1,k,r) & \text{if } i,j,k > 0 \wedge x_i = y_j \ne p_k \\
1 + f(i-1,j-1,k,r) & \text{if } k = 0 \wedge x_i = y_j \wedge (r = 0 \vee x_i \ne q_r) \\
f(i-1,j-1,k,r) & \text{if } k = 0 \wedge x_i = y_j \wedge (r = 1 \wedge x_i = q_r) \\
\max\{1 + f(i-1,j-1,k,r-1),\ f(i-1,j-1,k,r)\} & \text{if } k = 0 \wedge x_i = y_j = q_r \wedge r > 1
\end{cases}
\tag{6}
$$

The boundary conditions of this recursive formula are f(i, 0, 0, 0) = f(0, j, 0, 0) = 0 and f(i, 0, k, r) =
f(0, j, k, r) = -∞ for any 0 ≤ i ≤ n, 0 ≤ j ≤ m, 1 ≤ k ≤ s, and 1 ≤ r ≤ t.

Based on this formula, our algorithm for computing f(i, j, k, r) is a standard dynamic programming
algorithm. By the recursive formula (6), the dynamic programming algorithm for computing f(i, j, k, r) can
be implemented as the following Algorithm 1.

It is obvious that the algorithm requires O(nmst) time and space. For each value of f(i, j, k, r) computed
by algorithm Suffix, the corresponding LCS of X[1 : i] and Y[1 : j] including P[1 : k] as a suffix
and excluding Q[1 : r] as a subsequence can be constructed by backtracking through the computation paths from
(i, j, k, r) to (0, 0, 0, 0). The following algorithm back(i, j, k, r) is the backtracking algorithm to obtain the
LCS, not only its length. The time complexity of the algorithm back(i, j, k, r) is obviously O(n + m).

The second stage of our algorithm is to find LCSs in U(i, j, k). The length of an LCS in U(i, j, k) is
denoted as h(i, j, k). Chen et al. [1] presented a dynamic programming algorithm with O(nmt) time and
space. A reverse version of the dynamic programming algorithm for computing h(i, j, k) can be described as
follows.

For each value of h(i, j, k) computed by algorithm SEQ-EC-R, the corresponding LCS of X[i : n] and
Y[j : m] excluding Q[k : t] as a subsequence can be constructed by backtracking through the computation
paths from (i, j, k) until the boundary i = n + 1 or j = m + 1 is reached. The following algorithm
backr(i, j, k) is the backtracking algorithm to obtain the corresponding LCS, not only its length. The time
complexity of the algorithm backr(i, j, k) is obviously O(n + m).

By Theorem 2, the dynamic programming matrices f(i, j, k, r) and h(i, j, k) computed by the algorithms
Suffix and SEQ-EC-R can now be combined to obtain the solutions of the STR-IC-SEQ-EC-LCS problem
as follows. This is the final stage of our algorithm.
Algorithm 1 Suffix

Input: Strings X = x_1 ··· x_n, Y = y_1 ··· y_m of lengths n and m, respectively, and two constrained sequences
P = p_1 p_2 ··· p_s and Q = q_1 q_2 ··· q_t of lengths s and t

Output: f(i, j, k, r), the length of an LCS of X[1 : i] and Y[1 : j] including P[1 : k] as a suffix
and excluding Q[1 : r] as a subsequence, for all 1 ≤ i ≤ n, 1 ≤ j ≤ m, 0 ≤ k ≤ s, and
0 ≤ r ≤ t.

for all i, j, k, r, 0 ≤ i ≤ n, 0 ≤ j ≤ m, 0 ≤ k ≤ s and 0 ≤ r ≤ t do
    f(i, 0, k, r), f(0, j, k, r) ← -∞, f(i, 0, 0, 0), f(0, j, 0, 0) ← 0 {boundary condition}
end for
for all i, j, k, r, 1 ≤ i ≤ n, 1 ≤ j ≤ m, 0 ≤ k ≤ s and 0 ≤ r ≤ t do
    if x_i ≠ y_j then
        f(i, j, k, r) ← max{f(i - 1, j, k, r), f(i, j - 1, k, r)}
    else if k > 0 and x_i = p_k then
        if r = 0 or x_i ≠ q_r then
            f(i, j, k, r) ← 1 + f(i - 1, j - 1, k - 1, r)
        else if r = 1 and x_i = q_r then
            f(i, j, k, r) ← f(i - 1, j - 1, k, r)
        else
            f(i, j, k, r) ← max{1 + f(i - 1, j - 1, k - 1, r - 1), f(i - 1, j - 1, k, r)}
        end if
    else if k = 0 then
        if r = 0 or x_i ≠ q_r then
            f(i, j, k, r) ← 1 + f(i - 1, j - 1, k, r)
        else if r = 1 and x_i = q_r then
            f(i, j, k, r) ← f(i - 1, j - 1, k, r)
        else
            f(i, j, k, r) ← max{1 + f(i - 1, j - 1, k, r - 1), f(i - 1, j - 1, k, r)}
        end if
    else
        f(i, j, k, r) ← f(i - 1, j - 1, k, r)
    end if
end for
Algorithm 2 back(i, j, k, r)

Input: Integers i, j, k, r

Output: The LCS of X[1 : i] and Y[1 : j] including P[1 : k] as a suffix and excluding Q[1 : r] as a
subsequence

if i < 1 or j < 1 then
    return
end if
if x_i ≠ y_j then
    if f(i - 1, j, k, r) > f(i, j - 1, k, r) then
        back(i - 1, j, k, r)
    else
        back(i, j - 1, k, r)
    end if
else if k > 0 and x_i = p_k then
    if r = 0 or x_i ≠ q_r then
        back(i - 1, j - 1, k - 1, r)
        print x_i
    else if r = 1 and x_i = q_r then
        back(i - 1, j - 1, k, r)
    else
        if 1 + f(i - 1, j - 1, k - 1, r - 1) > f(i - 1, j - 1, k, r) then
            back(i - 1, j - 1, k - 1, r - 1)
            print x_i
        else
            back(i - 1, j - 1, k, r)
        end if
    end if
else if k = 0 then
    if r = 0 or x_i ≠ q_r then
        back(i - 1, j - 1, k, r)
        print x_i
    else if r = 1 and x_i = q_r then
        back(i - 1, j - 1, k, r)
    else
        if 1 + f(i - 1, j - 1, k, r - 1) > f(i - 1, j - 1, k, r) then
            back(i - 1, j - 1, k, r - 1)
            print x_i
        else
            back(i - 1, j - 1, k, r)
        end if
    end if
else
    back(i - 1, j - 1, k, r)
end if
Algorithm 3 SEQ-EC-R

Input: Strings X = x_1 ··· x_n, Y = y_1 ··· y_m of lengths n and m, respectively, and a constrained sequence
Q = q_1 q_2 ··· q_t of length t

Output: h(i, j, k), the length of an LCS of X[i : n] and Y[j : m] excluding Q[k : t] as a subsequence, for all
1 ≤ i ≤ n, 1 ≤ j ≤ m, 0 ≤ k ≤ t.

for all i, j, k, 0 ≤ i ≤ n, 0 ≤ j ≤ m, 1 ≤ k ≤ t do
    h(i, m + 1, k), h(n + 1, j, k) ← -∞ {boundary condition}
end for
for i = n down to 1 do
    for j = m down to 1 do
        for k = t + 1 down to 1 do
            if x_i ≠ y_j then
                h(i, j, k) ← max{h(i + 1, j, k), h(i, j + 1, k)}
            else
                if k > t or k ≤ t and x_i ≠ q_k then
                    h(i, j, k) ← 1 + h(i + 1, j + 1, k)
                else if x_i = q_k then
                    if k = t then
                        h(i, j, k) ← h(i + 1, j + 1, k)
                    else
                        h(i, j, k) ← max{1 + h(i + 1, j + 1, k + 1), h(i + 1, j + 1, k)}
                    end if
                end if
            end if
        end for
    end for
end for






Algorithm 4 backr(i, j, k)

Input: Integers i, j, k

Output: The LCS of X[i : n] and Y[j : m] excluding Q[k : t] as a subsequence

if i > n or j > m then
    return
end if
if x_i ≠ y_j then
    if h(i + 1, j, k) > h(i, j + 1, k) then
        backr(i + 1, j, k)
    else
        backr(i, j + 1, k)
    end if
else
    if k > t or k ≤ t and x_i ≠ q_k then
        print x_i
        backr(i + 1, j + 1, k)
    else if x_i = q_k then
        if k = t then
            backr(i + 1, j + 1, k)
        else
            if h(i + 1, j + 1, k) > 1 + h(i + 1, j + 1, k + 1) then
                backr(i + 1, j + 1, k)
            else
                print x_i
                backr(i + 1, j + 1, k + 1)
            end if
        end if
    end if
end if
Algorithm 5 STR-IC-SEQ-EC-LCS

Input: Strings X = x_1 ··· x_n, Y = y_1 ··· y_m of lengths n and m, respectively, and two constrained sequences
P = p_1 p_2 ··· p_s and Q = q_1 q_2 ··· q_t of lengths s and t

Output: The constrained LCS of X and Y including P as a substring, and excluding Q as a subsequence.

Suffix {compute f(i, j, k, r)}
SEQ-EC-R {compute h(i, j, k)}
i*, j*, k* ← 0, tmp ← -∞
for i = 1 to n do
    for j = 1 to m do
        for k = 1 to t do
            x ← f(i, j, s, k) + h(i + 1, j + 1, k)
            if tmp < x then
                tmp ← x, i* ← i, j* ← j, k* ← k
            end if
        end for
    end for
end for
if tmp > 0 then
    back(i*, j*, s, k*)
    backr(i* + 1, j* + 1, k*)
end if
return max{0, tmp}, i*, j*, k*


From the 'for' loops of the algorithm, it is readily seen that this final stage requires O(nmt) time.
Therefore, the overall time of our algorithm for solving the STR-IC-SEQ-EC-LCS problem is O(nmst),
dominated by the first stage.

4 Improvements of the algorithm 

S. Deorowicz[3] proposed the first quadratic-time algorithm for the STR-IC-LCS problem. A similar idea 
can be exploited to improve the time complexity of our dynamic programming algorithm for solving the 
STR-IC-SEQ-EC-LCS problem. The improved algorithm is also based on dynamic programming with some 
preprocessing. To show its correctness it is necessary to prove some more structural properties of the problem. 

Let Z[1 : l] = z_1, z_2, ..., z_l ∈ Z(n, m, s, t) be a constrained LCS of X and Y including P as a substring
and excluding Q as a subsequence. Let also I = (i_1, j_1), (i_2, j_2), ..., (i_l, j_l) be a sequence of indices of X and
Y such that Z[1 : l] = x_{i_1}, x_{i_2}, ..., x_{i_l} and Z[1 : l] = y_{j_1}, y_{j_2}, ..., y_{j_l}. From the problem statement, there
must exist an index d ∈ [1, l - s + 1] such that P = x_{i_d}, x_{i_{d+1}}, ..., x_{i_{d+s-1}} and
P = y_{j_d}, y_{j_{d+1}}, ..., y_{j_{d+s-1}}.

Theorem 3 Let i'_d = i_d and, for all e ∈ [1, s - 1], let i'_{d+e} be the smallest possible index of X, but larger
than i'_{d+e-1}, such that x_{i_{d+e}} = x_{i'_{d+e}}. The sequence of indices

I' = (i_1, j_1), (i_2, j_2), ..., (i_{d-1}, j_{d-1}), (i'_d, j_d), (i'_{d+1}, j_{d+1}), ..., (i'_{d+s-1}, j_{d+s-1}), (i_{d+s}, j_{d+s}), ..., (i_l, j_l)

defines the same constrained LCS as Z[1 : l].

Proof.

From the definition of indices i'_{d+e} it is obvious that they form an increasing sequence, since i'_d = i_d, and
i'_{d+s-1} ≤ i_{d+s-1}. The sequence i'_d, ..., i'_{d+s-1} is of course a compact appearance of P in X starting at i_d.
Therefore, both components of I' pairs form increasing sequences and for any (i'_u, j_u), x_{i'_u} = y_{j_u}. Therefore,
I' defines the same constrained LCS as Z[1 : l].

The proof is completed. □
The same property is also true for the j-th components of the sequence I. Therefore, we can conclude
that when finding a constrained LCS in Z(i, j, k, r), instead of checking all common subsequences of X
and Y it suffices to check only those common subsequences that contain compact appearances of P both in
X and Y. The number of different compact appearances of P in X and Y will be denoted by δ_x and δ_y,
respectively. It is obvious that δ_x δ_y ≤ δ, since a pair (i, j) defines a compact appearance of P in X starting
at the i-th position and a compact appearance of P in Y starting at the j-th position only for some matches.

Based on Theorem 3, we can reduce the time complexity of our dynamic programming from O(nmst) to
O(nmt). The improved algorithm also consists of three main stages.

Definition 5 For each occurrence i of the first character p_1 of P[1 : s] in X[1 : n], lx_i is defined as the index
of the last character p_s of a compact appearance of P in X starting at i. If x_i ≠ p_1 or there is no compact
appearance of P after i, then lx_i = 0. Similarly, for each occurrence j of the first character p_1 of P[1 : s] in
Y[1 : m], ly_j is defined as the index of the last character p_s of a compact appearance of P in Y starting at j.

In the first stage both sequences X and Y are preprocessed to determine two corresponding arrays lx 
and ly. 


Algorithm 6 Prep

Input: X, Y

Output: For each 1 ≤ i ≤ n, the minimal index r = lx_i such that X[i : r] includes P as a subsequence.
For each 1 ≤ j ≤ m, the minimal index r = ly_j such that Y[j : r] includes P as a subsequence.

for i = 1 to n do
    if x_i = p_1 then
        lx_i ← left(X, n, i)
    else
        lx_i ← 0
    end if
end for
for j = 1 to m do
    if y_j = p_1 then
        ly_j ← left(Y, m, j)
    else
        ly_j ← 0
    end if
end for


In the algorithm Prep, function left is used to find the index lx_i of the last character p_s of a compact
appearance of P.
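The preprocessing stage can be sketched in Python as follows. This is an illustrative sketch rather than a transcription of Algorithms 6 and 7: the signatures `left(X, P, i)` and `prep(X, P)` are assumptions (the pseudocode passes n instead of P), positions are 1-based as in the text, and P is assumed to be non-empty.

```python
def left(X, P, i):
    """Return the 1-based end index of the compact appearance of P in X that
    starts at 1-based position i, or 0 if X[i] != P[1] or no appearance exists."""
    if not P or X[i - 1] != P[0]:
        return 0
    a = i  # 0-based index of the next character of X to inspect
    b = 1  # number of characters of P matched so far
    while a < len(X) and b < len(P):
        if X[a] == P[b]:
            b += 1
        a += 1
    # After the loop, a is the 1-based position of the last inspected character.
    return a if b == len(P) else 0


def prep(X, P):
    """Return the array lx of Definition 5; lx[0] is an unused placeholder."""
    return [0] + [left(X, P, i) for i in range(1, len(X) + 1)]


if __name__ == "__main__":
    print(prep("abcab", "ab"))
```

Matching each character of P greedily at its earliest possible position is what makes the appearance compact. Calling `left` for every start position costs O(n) per call, which stays within the O(nmt) budget whenever n ≤ mt.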

In the second stage two DP matrices of the SEQ-EC-LCS problem are computed: h(i, j, k), the reverse one
defined by Definition 3, and v(i, j, k), the forward one defined by Definition 4. Both of the DP matrices can
be computed by the SEQ-EC-LCS algorithm of Chen et al. [1].
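A minimal Python sketch of the two tables is given below. It rests on assumptions that are not taken from the pseudocode in this paper: the names `seq_ec_forward` and `seq_ec_reverse` are hypothetical, boundary entries with k ≥ 1 are set to 0 (the empty string is a feasible solution), and the index k = 0 is marked infeasible with -∞ because every string contains the empty prefix of Q. The reverse table is not filled separately; it is read off a forward table built on the reversed inputs, using the fact that a string excludes Q[k : t] exactly when its reversal excludes the first t - k + 1 characters of the reversed Q.

```python
NEG = float("-inf")


def seq_ec_forward(X, Y, Q):
    """v[i][j][k], 1 <= k <= t: length of a longest common subsequence of
    X[1:i] and Y[1:j] that does not contain Q[1:k] as a subsequence."""
    n, m, t = len(X), len(Y), len(Q)
    v = [[[NEG] + [0] * t for _ in range(m + 1)] for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            for k in range(1, t + 1):
                if X[i - 1] != Y[j - 1]:
                    v[i][j][k] = max(v[i - 1][j][k], v[i][j - 1][k])
                elif X[i - 1] != Q[k - 1]:
                    # The matched character cannot complete Q[1:k].
                    v[i][j][k] = 1 + v[i - 1][j - 1][k]
                else:
                    # Use the match (then Q[1:k-1] must be excluded before it)
                    # or skip it; for k = 1 the first option is -inf.
                    v[i][j][k] = max(1 + v[i - 1][j - 1][k - 1],
                                     v[i - 1][j - 1][k])
    return v


def seq_ec_reverse(X, Y, Q):
    """Return an accessor h(i, j, k) for 1 <= i <= n+1, 1 <= j <= m+1,
    1 <= k <= t+1: LCS length of X[i:n] and Y[j:m] excluding Q[k:t]."""
    n, m, t = len(X), len(Y), len(Q)
    vr = seq_ec_forward(X[::-1], Y[::-1], Q[::-1])
    return lambda i, j, k: vr[n - i + 1][m - j + 1][t - k + 1]


if __name__ == "__main__":
    v = seq_ec_forward("abcb", "abcb", "bb")
    h = seq_ec_reverse("abcb", "abcb", "bb")
    print(v[4][4][2], h(1, 1, 1))
```

Both tables take O(nmt) time and space. With this convention h(i, j, t + 1) evaluates to -∞, which corresponds to the case r > t that the final stage has to reject.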

In the last stage, the two preprocessed arrays lx and ly are used to determine the final results. To this end,
for each match (i, j) for X and Y the ends (lx_i, ly_j) of compact appearances of P in X starting at position
i and in Y starting at position j are read. The length of an STR-IC-SEQ-EC-LCS, g(n, m, s, t) defined by
Definition 1, containing these appearances of P is determined as a sum of three parts for some indices
i, j, k, r: v(i - 1, j - 1, k), the constrained LCS length of prefixes of X and Y ending at positions i - 1 and
j - 1, excluding Q[1 : k] as a subsequence; h(lx_i + 1, ly_j + 1, r), the constrained LCS length of suffixes of X
and Y starting at positions lx_i + 1 and ly_j + 1, excluding Q[r : t] as a subsequence; and the constraint length
s. The integers k and r are related to each other, as made precise by the index α(k) defined later in this section.
Algorithm 7 left(X, n, i)

Input: Integers n, i and X[1 : n]

Output: The minimal index r such that X[i : r] includes P as a subsequence

a ← i + 1, b ← 2
while a ≤ n and b ≤ s do
    if x_a = p_b then
        a ← a + 1, b ← b + 1
    else
        a ← a + 1
    end if
end while
if b > s then
    return a - 1
else
    return 0
end if


Algorithm 8 SEQ-EC

Input: Strings X = x_1 ··· x_n, Y = y_1 ··· y_m of lengths n and m, respectively, and a constrained sequence
Q = q_1 q_2 ··· q_t of length t

Output: v(i, j, k), the length of an LCS of X[1 : i] and Y[1 : j] excluding Q[1 : k] as a subsequence, for all
1 ≤ i ≤ n, 1 ≤ j ≤ m, 0 ≤ k ≤ t.

for all i, j, k, 0 ≤ i ≤ n, 0 ≤ j ≤ m, 1 ≤ k ≤ t do
    v(i, 0, k), v(0, j, k) ← -∞ {boundary condition}
end for
for i = 1 to n do
    for j = 1 to m do
        for k = 0 to t do
            if x_i ≠ y_j then
                v(i, j, k) ← max{v(i - 1, j, k), v(i, j - 1, k)}
            else
                if k = 0 or k > 0 and x_i ≠ q_k then
                    v(i, j, k) ← 1 + v(i - 1, j - 1, k)
                else if x_i = q_k then
                    if k = 1 then
                        v(i, j, k) ← v(i - 1, j - 1, k)
                    else
                        v(i, j, k) ← max{1 + v(i - 1, j - 1, k - 1), v(i - 1, j - 1, k)}
                    end if
                end if
            end if
        end for
    end for
end for
Algorithm 9 backf(i, j, k)

Input: Integers i, j, k

Output: The LCS of X[1 : i] and Y[1 : j] excluding Q[1 : k] as a subsequence

if i < 1 or j < 1 then
    return
end if
if x_i ≠ y_j then
    if v(i - 1, j, k) > v(i, j - 1, k) then
        backf(i - 1, j, k)
    else
        backf(i, j - 1, k)
    end if
else
    if k = 0 or k > 0 and x_i ≠ q_k then
        backf(i - 1, j - 1, k)
        print x_i
    else if x_i = q_k then
        if k = 1 or v(i - 1, j - 1, k) > 1 + v(i - 1, j - 1, k - 1) then
            backf(i - 1, j - 1, k)
        else
            backf(i - 1, j - 1, k - 1)
            print x_i
        end if
    end if
end if


Definition 6 For each integer k, 1 ≤ k ≤ t, the index α(k) is defined as:

$$
\alpha(k) = \max_{0 \le r \le t-k+1} \{ r \mid P \text{ includes } Q[k : k+r-1] \text{ as a subsequence} \} \tag{7}
$$

Since the constrained LCS A of prefixes of X and Y ending at positions i - 1 and j - 1 excludes Q[1 : k]
as a subsequence, the concatenation of A and P will exclude Q[1 : r] as a subsequence, where r = k + α(k).
The constrained LCS B of suffixes of X and Y starting at positions lx_i + 1 and ly_j + 1 excludes Q[r : t] as
a subsequence. Therefore, the concatenation of A, P and B excludes Q as a subsequence.
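The index α(k) can be computed by one greedy left-to-right scan of P, since matching each character of Q[k : t] at its earliest possible position in P maximizes the matched prefix. The sketch below is a hedged illustration: the signature `alpha(P, Q, k)` is an assumption, and k is 1-based as in Definition 6.

```python
def alpha(P, Q, k):
    """Length of the longest prefix of Q[k:t] (1-based k) that is a
    subsequence of P, found by greedy earliest matching."""
    b = k - 1  # 0-based index of the next character of Q to match
    for ch in P:
        if b < len(Q) and ch == Q[b]:
            b += 1
    return b - (k - 1)


if __name__ == "__main__":
    # With P = "abcab" and Q = "bb", both characters of Q can be matched in P.
    print(alpha("abcab", "bb", 1))
```

Each call runs in O(s) time, so tabulating α(k) for all 1 ≤ k ≤ t costs O(st), which does not affect the overall O(nmt) bound when s ≤ nm.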

According to the matrices v(i, j, k) and h(i, j, k), backtracking can be used to obtain the optimal
subsequence, not only its length.
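The three stages can be combined into a length-only Python sketch. It is written under stated assumptions and is not a transcription of the pseudocode: all function names are hypothetical, P and Q are assumed non-empty, boundary entries of the tables are 0 for k ≥ 1 and -∞ for k = 0, the reverse table is obtained from a forward table on reversed inputs, and an infeasible instance yields -∞ instead of 0. The helper functions are repeated in compact form only so that the block runs on its own.

```python
NEG = float("-inf")


def seq_ec(X, Y, Q):
    """Forward table: v[i][j][k] = LCS length of X[1:i], Y[1:j] excluding Q[1:k]."""
    n, m, t = len(X), len(Y), len(Q)
    v = [[[NEG] + [0] * t for _ in range(m + 1)] for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            for k in range(1, t + 1):
                if X[i - 1] != Y[j - 1]:
                    v[i][j][k] = max(v[i - 1][j][k], v[i][j - 1][k])
                elif X[i - 1] != Q[k - 1]:
                    v[i][j][k] = 1 + v[i - 1][j - 1][k]
                else:
                    v[i][j][k] = max(1 + v[i - 1][j - 1][k - 1],
                                     v[i - 1][j - 1][k])
    return v


def compact_ends(X, P):
    """ends[i] = 1-based end of the compact appearance of P starting at i, else 0."""
    n, s = len(X), len(P)
    ends = [0] * (n + 1)
    for i in range(1, n + 1):
        if X[i - 1] != P[0]:
            continue
        a, b = i, 1
        while a < n and b < s:
            if X[a] == P[b]:
                b += 1
            a += 1
        if b == s:
            ends[i] = a
    return ends


def alpha(P, Q, k):
    """Longest prefix of Q[k:t] that is a subsequence of P (greedy scan)."""
    b = k - 1
    for ch in P:
        if b < len(Q) and ch == Q[b]:
            b += 1
    return b - (k - 1)


def str_ic_seq_ec_lcs_length(X, Y, P, Q):
    """Length of an LCS of X and Y containing P as a substring and excluding
    Q as a subsequence, or -inf if no such common subsequence exists."""
    n, m, s, t = len(X), len(Y), len(P), len(Q)
    v = seq_ec(X, Y, Q)
    vr = seq_ec(X[::-1], Y[::-1], Q[::-1])  # reverse table via reversed inputs
    lx, ly = compact_ends(X, P), compact_ends(Y, P)
    al = [0] + [alpha(P, Q, k) for k in range(1, t + 1)]
    best = NEG
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if lx[i] == 0 or ly[j] == 0:
                continue
            for k in range(1, t + 1):
                r = k + al[k]
                if r > t:  # the prefix part followed by P may already contain Q
                    continue
                # h(lx_i + 1, ly_j + 1, r) read from the reversed forward table
                tail = vr[n - lx[i]][m - ly[j]][t - r + 1]
                best = max(best, v[i - 1][j - 1][k] + s + tail)
    return best


if __name__ == "__main__":
    print(str_ic_seq_ec_lcs_length("abcab", "acbab", "ab", "c"))
```

The triple loop mirrors the maximization over i, j and k, and the two tables dominate the cost at O(nmt) time and space. Recovering the subsequence itself would additionally require the backtracking procedures described in the text.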

Theorem 4 The algorithm STR-IC-SEQ-EC-LCS correctly computes a constrained LCS in Z(n, m, s, t).
The algorithm requires O(nmt) time and O(nmt) space in the worst case.

Proof.

Let Z[1 : l] = z_1, z_2, ..., z_l be a solution of the STR-IC-SEQ-EC-LCS problem, i.e. Z[1 : l] ∈ Z(n, m, s, t),
and let its length be denoted as l = g(n, m, s, t). To prove the theorem, we have to prove in fact that

$$
g(n,m,s,t) = s + \max_{1 \le i \le n,\ 1 \le j \le m,\ 0 \le k \le t} \{ v(i-1,j-1,k) + h(lx_i+1, ly_j+1, k+\alpha(k)) \} \tag{8}
$$

where h(i, j, k) is the length of an LCS in U(i, j, k) defined by Definition 3, and v(i, j, k) is the length of
an LCS in V(i, j, k) defined by Definition 4.

Since Z[1 : l] ∈ Z(n, m, s, t), Z[1 : l] must be an LCS of X and Y including P as a substring and excluding
Q as a subsequence. Let the first appearance of the string P in Z[1 : l] start at position l' - s + 1 and end at l'
for some positive integer s ≤ l' ≤ l, i.e. Z[l' - s + 1 : l'] = P.
Algorithm 10 α(k)

Input: Integer k

Output: The maximum length r (0 ≤ r ≤ t - k + 1) such that P includes Q[k : k + r - 1] as a subsequence

a ← k, b ← 1, r ← 0
while a ≤ t and b ≤ s do
    if q_a = p_b then
        a ← a + 1, r ← r + 1
    end if
    b ← b + 1
end while
return r


Algorithm 11 STR-IC-SEQ-EC-LCS

Input: Strings $X = x_1 \cdots x_n$, $Y = y_1 \cdots y_m$ of lengths $n$ and $m$, respectively, and two constraining sequences
$P = p_1 p_2 \cdots p_s$ and $Q = q_1 q_2 \cdots q_t$ of lengths $s$ and $t$

Output: The length of an LCS of $X$ and $Y$ including $P$ as a substring and excluding $Q$ as a subsequence.

SEQ-EC {compute $v(i,j,k)$}
SEQ-EC-R {compute $h(i,j,k)$}
Prep {compute $lx$, $ly$}
$i^*, j^*, k^*, r^* \leftarrow 0$, $tmp \leftarrow 0$
for $i = 1$ to $n$ do
    for $j = 1$ to $m$ do
        if $lx_i > 0$ and $ly_j > 0$ then
            for $k = 1$ to $t$ do
                $r \leftarrow k + \alpha(k)$
                $c \leftarrow v(i-1, j-1, k) + h(lx_i + 1, ly_j + 1, r) + s$
                if $r > t$ then
                    $c \leftarrow -\infty$
                end if
                if $tmp < c$ then
                    $tmp \leftarrow c$, $i^* \leftarrow i$, $j^* \leftarrow j$, $k^* \leftarrow k$, $r^* \leftarrow r$
                end if
            end for
        end if
    end for
end for
if $tmp > 0$ then
    backf($i^*-1, j^*-1, k^*$)
    print $P$
    backr($lx_{i^*}+1, ly_{j^*}+1, r^*$)
end if
return $\max\{0, tmp\}$, $i^*, j^*, k^*, r^*$
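
When implementing Algorithm 11, a brute-force reference can help to test the result on small inputs. The following Python sketch is not the algorithm of this paper: it enumerates all subsequences of $X$ in exponential time and keeps the longest one that is also a subsequence of $Y$, contains $P$ as a substring and excludes $Q$ as a subsequence. The function names are hypothetical, and $P$ and $Q$ are assumed to be non-empty.

```python
from itertools import combinations


def is_subsequence(small: str, big: str) -> bool:
    """True if `small` can be obtained from `big` by deleting characters."""
    it = iter(big)
    return all(ch in it for ch in small)  # `in` consumes the iterator in order


def brute_force_length(X: str, Y: str, P: str, Q: str) -> int:
    """Length of a longest feasible sequence, or 0 if none exists."""
    for length in range(len(X), 0, -1):  # try the longest candidates first
        for idx in combinations(range(len(X)), length):
            z = "".join(X[i] for i in idx)
            if P in z and is_subsequence(z, Y) and not is_subsequence(Q, z):
                return length
    return 0
```

For `X = Y = "abcd"`, `P = "bc"` and `Q = "ad"`, the sequence `"abcd"` is rejected because it contains `"ad"` as a subsequence, and the function returns 3 (for instance `"abc"`).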
Let

$$r^* = \max_{0 \le r \le t} \{ r \mid Q[1:r] \text{ is a subsequence of } Z[1:l'-s] \}$$

Since $Z[1:l'-s]$ excludes $Q$ as a subsequence, we have $r^* < t$, and thus $Z[1:l'-s]$ excludes $Q[1:r^*+1]$
as a subsequence.

Let $i^*$ and $j^*$ be the minimal indices $i$ and $j$ such that $Z[1:l'-s+1]$ is a common subsequence of $X[1:i]$ and $Y[1:j]$.

Then, $x_{i^*} = y_{j^*} = p_1 = z_{l'-s+1}$, and $x_{lx_{i^*}} = y_{ly_{j^*}} = p_s = z_{l'}$.

Therefore, $Z[1:l'-s]$ is a common subsequence of $X[1:i^*-1]$ and $Y[1:j^*-1]$ excluding $Q[1:r^*+1]$
as a subsequence; $Z[l'+1:l]$ is a common subsequence of $X[lx_{i^*}+1:n]$ and $Y[ly_{j^*}+1:m]$.

It follows from Definition 4 that

$$l' - s \le v(i^*-1, j^*-1, r^*+1) \qquad (9)$$

Since $Q[1:r^*]$ is the longest prefix of $Q$ in $Z[1:l'-s]$, and

$$\alpha(r^*+1) = \max_{0 \le r \le t-r^*} \{ r \mid P \text{ includes } Q[r^*+1 : r^*+r] \text{ as a subsequence} \}$$

we have that $Z[1:l']$ includes $Q[1 : r^*+\alpha(r^*+1)]$ as a subsequence. Since $Z[1:l]$ excludes $Q$ as
a subsequence, it follows that $Z[l'+1:l]$ excludes $Q[r^*+1+\alpha(r^*+1) : t]$ as a subsequence. Therefore,
$Z[l'+1:l]$ is a common subsequence of $X[lx_{i^*}+1:n]$ and $Y[ly_{j^*}+1:m]$ excluding $Q[r^*+1+\alpha(r^*+1) : t]$
as a subsequence. It follows from Definition 3 that

$$l - l' \le h(lx_{i^*}+1, ly_{j^*}+1, r^*+1+\alpha(r^*+1)) \qquad (10)$$

Combining formulas (9) and (10) we have

$$l - s \le v(i^*-1, j^*-1, r^*+1) + h(lx_{i^*}+1, ly_{j^*}+1, r^*+1+\alpha(r^*+1))$$

Therefore,

$$l \le s + \max_{1 \le i \le n,\ 1 \le j \le m,\ 0 < k \le t} \{ v(i-1, j-1, k) + h(lx_i + 1, ly_j + 1, k + \alpha(k)) \} \qquad (11)$$

On the other hand, for any $a \in V(i-1,j-1,k)$ and $b \in U(lx_i+1, ly_j+1, k+\alpha(k))$, $1 \le i \le n$, $1 \le j \le m$,
$1 \le k \le t$, let $c = a \oplus P \oplus b$. If $lx_i > 0$ and $ly_j > 0$, then $c$ must be a common subsequence of $X[1:n]$ and
$Y[1:m]$ including $P$ as a substring. Furthermore, we can prove that $c$ excludes $Q$ as a subsequence.

In fact, since $a$ excludes $Q[1:k]$ as a subsequence, the length of the longest prefix of $Q$ in $a$ is at most
$k-1$, and thus the length of the longest prefix of $Q$ in $a \oplus P$ is at most $k-1+\alpha(k)$. Since $b$ is a common
subsequence of $X[lx_i+1:n]$ and $Y[ly_j+1:m]$ excluding $Q[k+\alpha(k):t]$ as a subsequence, we have that
$c = a \oplus P \oplus b$ is a common subsequence of $X[1:n]$ and $Y[1:m]$ including $P$ as a substring and excluding
$Q$ as a subsequence, and thus $|a \oplus P \oplus b| \le l$. Therefore,

$$s + \max_{1 \le i \le n,\ 1 \le j \le m,\ 0 < k \le t} \{ v(i-1, j-1, k) + h(lx_i + 1, ly_j + 1, k + \alpha(k)) \} \le l \qquad (12)$$

Combining formulas (11) and (12) we have

$$l = s + \max_{1 \le i \le n,\ 1 \le j \le m,\ 0 < k \le t} \{ v(i-1, j-1, k) + h(lx_i + 1, ly_j + 1, k + \alpha(k)) \}$$

The time and space complexities of the algorithm are dominated by the computation of the two dynamic
programming matrices $v(i,j,k)$ and $h(i,j,k)$. It is obvious that they are both $O(nmt)$ in the worst case.

The proof is completed. □ 
5 Concluding remarks

We have suggested a new dynamic programming solution for the new generalized constrained longest common
subsequence problem STR-IC-SEQ-EC-LCS. The first dynamic programming algorithm requires $O(nmst)$ time in
the worst case, where $n$, $m$, $s$, $t$ are the lengths of the four input sequences, respectively. The time complexity
can be reduced further to cubic time in a more detailed analysis. Many other generalized constrained longest
common subsequence (GC-LCS) problems have similar structures. It is not clear whether the
technique of this paper can be applied to these problems to achieve efficient algorithms. We will investigate
these problems further.